DOCUMENT RESUME 



ED 236 191                          TM 830 700

AUTHOR          Tindal, Gerald; And Others
TITLE           The Technical Adequacy of a Basal Reading Series
                Mastery Test.
INSTITUTION     Minnesota Univ., Minneapolis. Inst. for Research on
                Learning Disabilities.
SPONS AGENCY    Office of Special Education and Rehabilitative
                Services (ED), Washington, DC.
REPORT NO       IRLD-RR-113
PUB DATE        Apr 83
CONTRACT        300-80-0622
NOTE            43p.
PUB TYPE        Reports - Research/Technical (143)

EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Basal Reading; Criterion Referenced Tests; Grade 6;
                Intermediate Grades; *Mastery Tests; Measurement
                Techniques; Reading Research; *Reading Tests; *Test
                Reliability; *Test Validity
IDENTIFIERS     *Houghton Mifflin Reading Series; SRA Diagnostic
                Reading Tests; Word Reading Test



ABSTRACT 

The purposes of this study were to examine the
reliability and validity of a basal reading series mastery test, and
to explore the appropriateness and usefulness of two strategies for
investigating the reliability and validity of criterion-referenced
tests. Subjects were 47 sixth graders, who were tested on the SRA
Reading Achievement Test, the Houghton-Mifflin End-of-level 11 Basic
Reading Test (BRT), and the Word Reading Test. A subgroup of 20
children was tested a second time on the BRT. Traditional
psychometric correlational analyses as well as specific strategies
for examining the adequacy of criterion-referenced tests were applied
to the data to investigate the following dimensions of the technical
adequacy of the BRT: (1) consistency of student performance across
two administrations of the BRT, and (2) criterion validity of the BRT
scores with respect to two other measures of reading proficiency and
criterion validity of the BRT mastery/nonmastery decisions with
respect to pre/post instructional status. Results indicated that the
reliability and validity of the BRT were less than adequate, and that
both strategies for investigating the adequacy of a
criterion-referenced test were useful and provided complementary
information. Implications for the development and use of
criterion-referenced instruments are discussed. (Author)



***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                    from the original document.                      *
***********************************************************************

University of Minnesota



Research Report No. 113 



THE TECHNICAL ADEQUACY OF A BASAL READING 
SERIES MASTERY TEST 



Gerald Tindal, Mark Shinn, Lynn Fuchs, Douglas Fuchs, 
Stanley Deno, and Gary Germann 




SCOPE OF INTEREST NOTICE
The ERIC Facility has assigned
this document for processing
to: [illegible]

In our judgement, this document
is also of interest to the clearing-
houses noted to the right. Index-
ing should reflect their special
points of view.



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 







TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



U.S. DEPARTMENT OF EDUCATION 

NATIONAL INSTITUTE OF EDUCATION 

EDUCATIONAL RESOURCES INFORMATION 
CENTER (ERIC) 
This document has been reproduced as
received from the person or organization
originating it.

□ Minor changes have been made to improve
reproduction quality.

• Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.



Director: James E. Ysseldyke



The Institute for Research on Learning Disabilities is supported by
a contract (300-80-0622) with the Office of Special Education, Department
of Education, through Title VI-G of Public Law 91-230. Institute
investigators are conducting research on the assessment/decision-making/
intervention process as it relates to learning disabled students.

During 1980-1983, Institute research focuses on four major areas:

• Referral

• Identification/Classification

• Intervention Planning and Progress Evaluation

• Outcome Evaluation

Additional information on the Institute's research objectives and
activities may be obtained by writing to the Editor at the Institute
(see Publications list for address).



The research reported herein was conducted under government sponsorship.
Contractors are encouraged to express freely their professional
judgment in the conduct of the project. Points of view
or opinions stated do not, therefore, necessarily represent the
official position of the Office of Special Education.



Research Report No. 113

THE TECHNICAL ADEQUACY OF A BASAL READING
SERIES MASTERY TEST

Gerald Tindal, Mark Shinn, Lynn Fuchs, Douglas Fuchs,
Stanley Deno, and Gary Germann
Institute for Research on Learning Disabilities
University of Minnesota

April, 1983


Abstract 

The purposes of this study were to (a) examine the reliability
and validity of a basal reading series mastery test, and (b) explore
the appropriateness and usefulness of two strategies for investigating
the reliability and validity of criterion-referenced tests. Subjects
were 47 sixth graders, who were tested on the SRA Reading Achievement
Test, the Houghton-Mifflin End-of-level 11 Basic Reading Test (BRT),
and the Word Reading Test. A subgroup of 20 children was tested a
second time on the BRT. Traditional psychometric correlational
analyses as well as specific strategies for examining the adequacy of
criterion-referenced tests were applied to the data to investigate the
following dimensions of the technical adequacy of the BRT: (a)
consistency of student performance across two administrations of the
BRT, and (b) criterion validity of the BRT scores with respect to two
other measures of reading proficiency and criterion validity of the
BRT mastery/nonmastery decisions with respect to pre/post
instructional status. Results indicated that the reliability and
validity of the BRT were less than adequate, and that both strategies
for investigating the adequacy of a criterion-referenced test were
useful and provided complementary information. Implications for the
development and use of criterion-referenced instruments are discussed.




The Technical Adequacy of a Basal Reading 
Series Mastery Test 

With the growing demand for accountability in the schools, the 
focus on educational tests has expanded. Norm-referenced achievement 
testing, the traditional measurement format, is the predominant 
measurement strategy for evaluating and documenting program effects. 
Concurrent with its frequent use, however, is growing recognition that 
norm-referenced measurement may be inadequate for its intended 
purposes: It has poor content validity with respect to classroom 
curricula, and it fails to indicate the extent to which individuals or 
groups have mastered specific educational objectives (Skager, 1971). 

As an alternative to traditional educational measurement,
criterion-referenced (CR) testing has received greater attention in
the past two decades by measurement theorists, test developers, and
school personnel. As conceptualized by Glaser and Nitko (1971), the
CR test is a sample of items yielding information that is
interpretable directly with respect both to a well-defined domain of
tasks and to specified performance standards. This definition
reflects three characteristics that frequently are employed in the
literature to describe CR measurement: (a) definition of a
well-specified content domain (Baker, 1974; Hambleton & Novick, 1973;
Millman, 1974), (b) delineation of valid performance criteria
(Hambleton, 1980), and (c) development of procedures for generating
appropriate samples of tests (Goodstein, 1982; Hambleton, Swaminathan,
Algina, & Coulson, 1978; Popham, 1980). All three components stress
the edumetric and psychometric properties of CR tests.

Nevertheless, the focus both in publishing houses and in the
schools has been more utilitarian. With the recognition that CR tests
provide relevant data for describing student progress with respect to
specific learning objectives, their use has proliferated. Test
developers have marketed CR instruments along with objective banks;
commercial curriculum writers have published CR tools for assessing
mastery within their series; school districts have created their own
CR tests; and teachers have developed such instruments to fit
individual learning objectives. Unfortunately, there has been a lack
of concomitant investigation of the reliability and validity of these
tests.

Therefore, although two measurement formats currently are
available and used in educational settings, neither is adequate for
evaluating the effects of instructional programs. While
norm-referenced tests frequently demonstrate several strong psychometric
characteristics, they lack content validity and utility. Alternatively,
CR instruments are isomorphic with respect to classroom curricula and,
as such, appear very useful; however, there is little evidence that
such measurement is accurate or meaningful.

The current study addressed part of this dilemma by beginning the
task of investigating the reliability and validity of available CR
tests. Traditional ways of assessing such adequacy, however, have
been criticized as largely inappropriate for CR instruments (Popham &
Husek, 1969). Hambleton and Novick (1973) reasoned that, because one
of the purposes of a CR test is to identify mastery within a domain,
test variance typically is small. Homogeneous distributions of test
scores are centered at the low and high ends of the measurement scale,
respectively representing pre and post-instruction performance
(Hambleton & Novick, 1973). When the variance of test scores is
restricted in this way, correlational estimates of reliability and
validity tend to be low. In response to this problem, alternative
analyses for investigating the adequacy of CR tests have been
developed (Berk, 1980); in contrast to the correlation statistic,
these analyses rely minimally on the notion that inter-individual
variability is necessary (Carver, 1970; Hambleton & Novick, 1973;
Huynh, 1976; Subkoviak, 1975).
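
The effect of restricted variance on correlational estimates can be illustrated with a small simulation. The sketch below is not drawn from the study's data: the sample size, the error spread, and the cutoff used to form a homogeneous group are all hypothetical values chosen for illustration. It compares the correlation between two noisy administrations of a test in a full sample with the correlation in a subsample restricted to high-skill examinees.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical setup: 5000 simulated examinees, each with a true skill
# level and two administrations that add independent measurement error.
rng = random.Random(0)
true_skill = [rng.gauss(0, 1) for _ in range(5000)]
test1 = [t + rng.gauss(0, 0.5) for t in true_skill]
test2 = [t + rng.gauss(0, 0.5) for t in true_skill]

# Correlation across the full, heterogeneous sample.
r_full = pearson(test1, test2)

# Correlation within a homogeneous group (assumed cutoff: true skill > 1.0),
# analogous to examinees clustered at the high end of the scale.
keep = [i for i, t in enumerate(true_skill) if t > 1.0]
r_restricted = pearson([test1[i] for i in keep], [test2[i] for i in keep])

print(f"full sample r = {r_full:.2f}")
print(f"restricted sample r = {r_restricted:.2f}")
```

The measurement error is identical in both cases; only the spread of true scores differs, which is why the restricted correlation is the lower of the two.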

Despite the development of such analyses, it appears that
developers of commercial CR instruments, if they address technical
adequacy at all, still rely predominantly on traditional psychometric
correlational analyses. Inspection of eight commercial
criterion-referenced instruments and four basal mastery tests revealed that (a)
only one-third of the test manuals addressed reliability and validity
at all, and (b) only traditional analyses were employed in the
investigations of the instruments' technical adequacy (see Table 1).



Insert Table 1 about here 



In the present study, both traditional correlational statistics
and alternative CR approaches were employed to examine the adequacy of
one CR instrument developed and published by a reading series company.
The purpose of this study was twofold. First, the investigation was
designed to contrast results based on the traditional and alternative
approaches to studying the technical adequacy of CR instruments. Such
a contrast should shed light on the appropriateness and potential
usefulness of each strategy. The second purpose was to describe the
reliability and validity of the specific CR measure examined. Despite
widespread use of this test, there are few, if any, reports concerning
its adequacy.¹ The investigation of the test's reliability and
validity should provide information of interest not only to consumers
of this measure but also to users of other CR tests for which
technical data also are still unavailable.

Method 

Subjects 

Subjects were 47 students (20 M, 27 F) from two sixth-grade
classes. Each class represented a school district within a rural
midwestern educational cooperative. The students' mean reading
percentile rank was 51.48 (SD = 18.11) as measured on the Science
Research Associates (SRA) Reading Achievement Test.
Measures 

Three measures of reading performance were used in the study: a
basal series criterion-referenced test, a global norm-referenced test,
and a curriculum-based word reading test.

Criterion-referenced test. Three scales of the End-of-level 11
Basic Reading Test (BRT; Brzeinski & Schoephoerster, 1974) of the
Houghton-Mifflin basal reading series were employed as measures. Each
of the three scales, Decoding Skills, Comprehension Skills, and
Reference/Study Skills, is comprised of several subtests. Table 2
lists the subtests constituting each scale and provides brief
descriptions of tasks the examinee is required to do within each
subtest. The BRT is designed as a criterion-referenced test, with
items per subtest ranging from 6 to 12 and with mastery-nonmastery
cutoff scores established at 83% to 85% correct responses.



Insert Table 2 about here

Norm-referenced test. The Science Research Associates (SRA)
Reading Achievement Test (Naslund, Thorpe, & Lefever, 1978) is
comprised of two subtests: vocabulary and comprehension. In the
vocabulary section, examinees are required to select, from four
alternatives, a synonym for an underlined word in a sentence. In the
comprehension section, examinees read 200-300 word passages and answer
questions in a multiple choice format. Total test score is based on a
linear combination of the two subtests. Internal consistency
reliability was reported at .88 (Salvia & Ysseldyke, 1981).

Curriculum-based word reading test. The Word Reading Test (Deno,
Mirkin, & Chiang, 1982) requires children to read aloud passages and
isolated word lists and is scored in terms of average numbers of words
correct and incorrect over two alternate forms of the Isolated Word
Reading and Passage Reading scales. The 200-word passages are drawn
randomly from a student's grade-appropriate level basal reading book;
the 150-word lists sample words randomly from basals, with 60% of
words drawn from the student's grade-appropriate level and 40% sampled
equally from all previous levels. For the passage and isolated word
scales of the Word Reading Test, test-retest and alternate form
reliabilities were at least .90 (Fuchs, Deno, & Marston, in press;
Fuchs, Wesson, Tindal, Mirkin, & Deno, 1981).
Procedure 

All students were tested in groups, by a school psychologist for
the SRA Reading Achievement Test, and by their classroom teachers for
the BRT. The Word Reading Test was administered individually by
trained aides. Standardized administration procedures were followed
on all tests. Testing time ranged from 60 to 90 minutes for the SRA
test, 60 to 90 minutes for the BRT, and five to six minutes for the
Word Reading Test. All testing was completed within a two-week
period.

To address test-retest reliability questions, a subgroup of 20
students (11 M, 9 F) was administered the measures in the following
order: BRT, SRA Reading Achievement Test, Word Reading Test, and BRT
again. For the remaining 27 students, each measure was given one
time, with the order of administration random.
Data Analysis 

Consistency of performance on two administrations of the same
test. Consistency of students' performance on the BRT was assessed in
three ways. In all three analyses, the students who had been tested
twice on the BRT (N=20) were the subjects. First, traditional
test-retest reliability was determined by correlating scores from the two
administrations of the BRT. The other two analysis strategies were
designed specifically for criterion-referenced measures (see Millman,
1974). In the first of these, consistency of students' subtest scores
was determined by (a) computing individuals' percentage correct score
on each subtest for each administration of the BRT, (b) calculating
for each individual his/her difference score across the two
administrations of each subtest, and (c) determining the percentages
of examinees having each possible difference score on each subtest.
In the second strategy, consistency of mastery-nonmastery decisions on
subtests was determined by dividing the difference between observed
and chance proportions of agreements in decisions by the maximum value
that difference could assume. (The chance proportion of agreements
was computed by multiplying and then summing the marginal proportions
of the same decision categories for the two administrations, as done
in a chi-square test of association.)
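
The two criterion-referenced consistency analyses described above can be sketched as follows. This is an illustration under stated assumptions, not the computation used in the study: the function names and the example scores are hypothetical, difference scores are taken as absolute differences in percentage correct, and the maximum value of the observed-minus-chance difference is assumed to be one minus the chance proportion (the form of the statistic commonly known as Cohen's kappa).

```python
from collections import Counter


def difference_score_distribution(first_pct, second_pct):
    """Percentage of examinees at each absolute difference between
    percentage-correct scores on two administrations of a subtest."""
    diffs = [abs(a - b) for a, b in zip(first_pct, second_pct)]
    counts = Counter(diffs)
    n = len(diffs)
    return {d: 100.0 * c / n for d, c in sorted(counts.items())}


def chance_corrected_agreement(first_mastery, second_mastery):
    """(observed - chance) / (1 - chance) for two lists of booleans,
    where True marks a mastery decision on that administration."""
    n = len(first_mastery)
    observed = sum(a == b for a, b in zip(first_mastery, second_mastery)) / n
    p1 = sum(first_mastery) / n   # proportion of masters, administration 1
    p2 = sum(second_mastery) / n  # proportion of masters, administration 2
    # Chance agreement: multiply matching marginal proportions, then sum.
    chance = p1 * p2 + (1 - p1) * (1 - p2)
    if chance == 1.0:
        raise ValueError("chance agreement is 1; the coefficient is undefined")
    return (observed - chance) / (1 - chance)


# Hypothetical percentage-correct scores for four examinees.
distribution = difference_score_distribution([100, 83, 67, 100],
                                             [100, 67, 67, 83])
print(distribution)

# Hypothetical mastery decisions for ten examinees on two administrations.
first = [True, True, True, False, False, True, False, True, True, False]
second = [True, True, False, False, False, True, True, True, True, False]
coefficient = chance_corrected_agreement(first, second)
print(round(coefficient, 3))
```

A coefficient of 0 indicates agreement no better than chance, negative values indicate agreement below chance, and 1 indicates perfect agreement.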

Criterion validity. The criterion validity of the BRT was
determined in two ways, employing the entire group of subjects (N=47).
The traditional psychometric strategy of correlating scores on the
measure of interest (BRT) with criterion measures was used. The SRA
Reading Achievement Test and the Word Reading Test were employed as
the criterion measures. Additionally, chi-square statistical tests
were applied to contingency tables wherein mastery-nonmastery
represented one dimension of each table and pre-post instructional
status represented the other dimension. Percentages of
misclassifications supplemented the chi-square tests.
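
The contingency-table analysis can be sketched as follows. The cell counts are hypothetical, the function names are introduced only for illustration, and the statistic is the ordinary Pearson chi-square without a continuity correction, which is an assumption about the procedure rather than a detail given in the text. Rows hold the BRT decision (mastery, nonmastery) and columns hold instructional status (post, pre), so the off-diagonal cells are the misclassifications.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts
    (no continuity correction; assumes every expected count is nonzero)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def percent_misclassified(table):
    """Percentage of examinees in the off-diagonal cells: masters who are
    pre-instruction plus nonmasters who are post-instruction."""
    (a, b), (c, d) = table
    return 100.0 * (b + c) / (a + b + c + d)


# Hypothetical counts for 47 examinees.
#                post  pre
example_table = [[20, 8],    # mastery
                 [6, 13]]    # nonmastery

chi2_value = chi_square_2x2(example_table)
misclassified = percent_misclassified(example_table)
print(f"chi-square = {chi2_value:.2f}, misclassified = {misclassified:.1f}%")
```

The chi-square value tests whether the two classifications are associated at all, whereas the misclassification percentage describes how often the mastery decision disagrees with instructional status, which is why the two figures are reported together.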

Results 

Table 3 is a display of students' mean scores and standard
deviations on each subtest of the BRT, on each subscale and the total
of the SRA Reading Achievement Test, and on the isolated word reading
and passage reading scales of the Word Reading Test.



Insert Table 3 about here

Consistency of Performance on Administrations of the Same Test

Test-retest reliability correlations on subtests of the BRT are
displayed in Table 4. For the decoding subtests, correlations were
low, ranging from .20 to .42; for the comprehension subtests,
correlations were low to moderate, ranging from .03 to .83; and for
the study/reference skills subtests, correlations were high, ranging
between .86 and .94.

Insert Table 4 about here

The second analysis of the consistency of performance involved
calculating the percentages of examinees who had different percentage
correct scores across the two administrations of the BRT. Figures 1-4
are graphic displays of the percentages of examinees displaying
various difference scores on each subtest of the BRT; Table 5
summarizes the information illustrated on the graphs. The range of
difference scores on the subtests fell between 0 and 83%. The
percentage of examinees with 0% difference scores on two
administrations ranged from 22 on an information appraising subtest to
85 on the word attack subtest. Across the decoding subtests, the mean
percentage of examinees with 0% difference scores was 65 (SD =
28.28); across the comprehension subtests, the mean percentage was
57.20 (SD = 14.96); across the study/reference skills subtests, the
mean percentage was 51.25 (SD = 18.76); and across all the subtests,
the mean percentage was 55.07 (SD = 17.92).



Insert Figures 1-4 and Table 5 about here



The third analysis of the consistency of performance addressed
consistency of mastery-nonmastery decisions across the two
administrations of the BRT. Table 6 is a display of the uncorrected
and corrected proportions of examinees placed into the same decision
category on the two administrations. On the decoding subtests, the
corrected proportions are low, with the proportion of agreement on the
Word Attack subtest 6% lower than chance and the proportion of
agreement on the Pronunciation subtest only 18% greater than chance.
On the comprehension subtests, the proportions of agreement were quite
variable, ranging from 15% lower than chance to 88% greater than
chance. On the study/reference skills subtests, proportions of
agreement were moderate to high, ranging from 51% to 78% greater than
chance.



Insert Table 6 about here 



Criterion Validity

Correlational analyses were conducted between the BRT subtests
and two criterion measures, the SRA Reading Achievement Test and the
Word Reading Test. Correlations between the BRT subtests and the SRA
subscale and total test scores are displayed in Table 7. They ranged
from .35 to .73 when SRA vocabulary subscale scores were involved,
from .19 to .70 when SRA comprehension subscale scores were employed,
and from .26 to .75 when SRA total scores were used. The average
correlation for BRT decoding subtests was .41 (SD = .02); for BRT
comprehension subtests, the average correlation was .52 (SD = .21),
and for BRT study/reference skills subtests, it was .57 (SD = .07).



Insert Table 7 about here 



Correlations between the BRT subtests and the Word Reading Test
subscale scores are displayed in Table 8. They ranged from .27 to .57
when isolated word reading scores were involved, and from .31 to .68
when passage reading scores were employed. The mean correlation for
the BRT decoding subtests was .34 (SD = .08); for the BRT
comprehension subtests, the mean correlation was .47 (SD = .13), and
for the BRT study/reference skills subtests, it was .56 (SD = .06).



Insert Table 8 about here 



Criterion validity also was examined by inspecting the relation
between mastery-nonmastery decisions on the BRT and actual pre-post
instructional status. Relevant chi-square values, p-values, and
percentages of misclassified students are displayed in Table 9.
Across the decoding subtests of the BRT, the average percentage of
misclassified students was 40.50 (SD = 3.54); across the comprehension
subtests, the average percentage was 39.00 (SD = 4.58); across the
study/reference skills subtests, it was 23.33 (SD = 8.51); and across
all the subtests, it was 33.50 (SD = 9.99).



Insert Table 9 about here



Discussion

The purpose of the current study was twofold. First, the study
was designed to describe the reliability and validity of a
criterion-referenced mastery test of a basal reading series. Second, by
examining this reliability and validity both with traditional
correlational analyses and with alternative strategies developed
specifically for criterion-referenced instruments, this investigation
sought to contrast results and assess the appropriateness and
potential usefulness of each strategy.

With respect to its first purpose, the study examined two aspects
of the technical adequacy of the Houghton-Mifflin End-of-level 11
Basic Reading Test: the consistency of students' performance on two
administrations of the test, and the criterion validity of the test.
On both of these indices, the Houghton-Mifflin BRT appeared
inadequate.

Test-retest reliability coefficients indicated that, when the BRT
was administered twice within a short time interval, students'
performance was very inconsistent on the decoding subtests; none of
the correlations obtained for the decoding subtests even fell within
the acceptable range for making group decisions (Salvia & Ysseldyke,
1981). On the comprehension subtests, correlations were poor to fair,
with the correlation for only one subtest, Meaning Acquisition,
falling into the acceptable range for group decision making and with
none of the correlations high enough for making decisions about
individual students. On the study/reference skills subtests, however,
student performance was more consistent, with all correlations at .86 or
better.

Results of this traditional correlational analysis of consistency
of student performance across tests were corroborated with the
criterion-referenced strategy of examining the proportions of
examinees consistently classified into the same decision category. As
with the correlational analyses, on the decoding subtests the
proportions were low, at an average of only 6% better than chance
agreement. On the comprehension subtests, proportions were low to
moderate, with 57% greater than chance agreement on Literal
Comprehension, 15% less than chance agreement on Interpretative
Thinking, and a mean 62.33% greater than chance agreement on Meaning
Acquisition. On the study/reference skills subtests, proportions were
moderate to high with an average 66.25% greater than chance agreement.

When inspecting the consistency of test scores displayed in
Figures 1-4 and in Table 5, the percentages of examinees scoring the
same across two administrations of the BRT appear variable. There was
no identifiable pattern within BRT scales; the average percentage of
subjects scoring the same across all the subtests was 55. Given the
fact that there are only 6 to 12 items per subtest and given a mastery
criterion of 83% to 85% per subtest, a difference of one or two items
correct in an administration of the BRT subtest can result in
different mastery decisions. Thus, an average of 55% of subjects
scoring the same on two BRT administrations appears to be lower than
desirable.
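
The sensitivity of mastery decisions to one or two items can be made concrete with a short calculation. The sketch below assumes that an examinee is classified as a master when the percentage correct is at least the cutoff; how the publisher rounds cutoffs is not stated in the text, so this rule and the function names are assumptions made for illustration.

```python
def items_needed(n_items, cutoff_percent):
    """Smallest number of correct items whose percentage meets the cutoff.
    Integer ceiling division avoids floating-point rounding."""
    return -(-n_items * cutoff_percent // 100)


def is_master(n_correct, n_items, cutoff_percent):
    """True when percentage correct is at least the cutoff (assumed rule)."""
    return n_correct * 100 >= cutoff_percent * n_items


# Shortest and longest subtests, at both ends of the reported cutoff range.
for n_items in (6, 12):
    for cutoff in (83, 85):
        need = items_needed(n_items, cutoff)
        print(f"{n_items} items, {cutoff}% cutoff: "
              f"{need} correct required, "
              f"one item is worth {100 / n_items:.1f} percentage points")
```

Under this rule, an examinee who answers 5 of 6 items correctly on one administration and 4 of 6 on the next moves from mastery to nonmastery at an 83% cutoff on the strength of a single item.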

The results of the three analyses indicate that the consistency
of student performance on the BRT is less than adequate and that
educators should exercise caution as they attempt, on the basis of one
administration of the BRT, to formulate decisions concerning whether
individual students should progress to more difficult instructional
material. While the study/reference skills subtests may be adequate
as a data base for making such decisions, the decoding and
comprehension subtests, which teachers may consider more critical for
formulating decisions about reading proficiency, were unreliable.

The criterion validity of the BRT also was examined. The
traditional correlational analyses indicated that the criterion
validity of the BRT with respect to the SRA Reading Achievement Test
and the Word Reading Test was poor to fair, with correlations falling
between .19 and .73. Correlations on the Interpretive Thinking
comprehension subtest were the lowest. Statistics for the decoding
subtests also were relatively low, whereas the figures for the
remaining comprehension and study/reference skills subtests were
somewhat higher. Correlations among measures of reading proficiency
frequently have been reported at high levels (Fuchs, Deno, & Marston,
in press; Fuchs, Fuchs, & Deno, 1982). This indicates that the
figures for the BRT are comparatively low and that performance on the
BRT is a relatively poor predictor of concurrent performance on other
measures of reading proficiency.




The criterion validity of the BRT also was investigated with the
criterion-referenced strategy of examining the relation between the
mastery-nonmastery classification on the BRT and actual pre-post
instructional status. Relatively high percentages of
misclassifications (15% to 43%) were found, suggesting limited utility
of the BRT for classifying students into groups for instruction within
the basal reader for which the BRT was designed.

Consequently, the current study casts doubt on the reliability
and validity of the Houghton-Mifflin End-of-level 11 Basic Reading
Test, and suggests that educators use this test with caution.
Educational tests are designed to sample an individual's behavior as
a basis for drawing generalizations concerning his/her functioning and
for making instructional decisions. When tests sample behavior in
meaningful (valid) and accurate (reliable) ways, they are useful for
such purposes. Although criterion-referenced tests may possess high
content and face validity, their meaningfulness and accuracy remain
empirical questions, an issue frequently ignored by
criterion-referenced test developers. By investigating the reliability and
validity of one criterion-referenced test, the present study (a)
documents the notion that content validity is a necessary but
insufficient aspect of criterion-referenced test adequacy, and (b)
underscores the importance of investigating the reliability and
validity of criterion-referenced tests as they are developed.

The second purpose of this study was to compare the
appropriateness and usefulness of traditional analyses with strategies
developed specifically for criterion-referenced tests. Findings
discussed above suggest that the two types of analyses tend to
corroborate and enhance each other, providing complementary
information. It appears that both strategies may be appropriate and
necessary for investigating and describing the reliability and
validity of criterion-referenced tests.




References 



Baker, E. L. Beyond objectives: Domain-referenced tests for eval- 
uation and'instructional improvement. Educational Technology. 
1974, 14, 10-16. 

Berk, R. A. A consumer's' .guide to criterion-referenced test 

reliability.- Journal of Educational Measurement . 1980, 17(4), 
. 323-349. 

' ' ■ 

Brzeinski, J., & Schoephoerster, H. Basic reading tests for Images . 
Boston: Houghton-Mifflin, 1974. 

Carver, R. P. Special problems in measuring change with psychometric 
devices. Evaluation research:- Strategies and methods . 
Pittsburgh: American Institute for Research, 1970. 

Deno, S. L., Mirkin, P. K.-, & Chiang, b: Identifying valid measures 
of reading. Exceptional Children , 1982, 49(1), 36-45. 

Fuchs; L. S., Deno, S. L., & Marston, D. Improving the reliability of 
curriculum-based measures of academic skills for !>sychoeduca- 
tional decisior;) making. Diaqnostique , in press. , 

Fuchs, L. S., Fuchs, D., & Deno, S. L. Reliability and validity of 
curriculum-based informal reading inventories. Reading Research 
Quarterly , 1982, 18(1), 6-26. 

Fuchs, L. S., Wesson, C, Tindal, G., Mirkin, P. K., & Deno, S. L. 
Teacher efficiency in continuous evaluation of lEP goals 
^Research Report No. 53). Minneapolis: University of Minnesota, 

InstttTTfr'for Research on Learning Disabilities, 1981. 

Glaser, R., & Nitko, J. Measurement in learning and instruction. In 
R. L. Thorndike (Ed.), Educational measurement (2nd ed.). 
Washington, D.C.: American Council on Education, 1971. 

Qoodstein, H. A. The reliability of criterion-referenced tests and 
special education: Assumed versus demonstrated. Journal of 
Special Education . 1982, 16 (1), 37-48. 

Hambleton, R. K. Test score validity. In R. A. Berk (Ed.). 
Criterion-referenced measure ment: The state of the art . 
Baltimore: The Johns Hopkins University Press, 1980. 

Hambleton, R. K., & Novick, M. R. Toward an integration of theory and 
method for criterion-referenced tests. Journal of Educational 
Measurement , 1973, 10(3), 159-170. 



Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. 

Criterion-referenced testing and measurement: A review of tech- 
nical issues and developments. Review of Educational Research ^ 
• ' 1978, 48(1), 1-47. 

Huynh, H. On the rel,iability of decisions in domain-referenced 

testing. . Journal of Educational Measurement , 1976, 13, 253-264. 

' Millman, J. Criterion-referenced measurement. Tn W. J.. Popham (Ed.), 
Evaluation in education; Current applications . Berkeley: 
McCutchan, 1974. ^ ^ 

Naslund, R. A., Thprpe, L. P., & Lefever, D. W. SRA achievement 
Series: Reading, mathematics, and language arts . Chicago: 
Science Research Associates, 1978. 

Popham, W. J. Domain specification strategies. In R. A. Berk (Ed.), 
Criterion-referenced measurement: The state of the art . 
Baltimore: The. Johns Hopkins University Press, 1980. 

Popham, W. J.; & Husek, T. R. . Implications of triterion-referenced^ 
measurement. Journal of Educational Measurement , 1969, 6_, 1-9. 

Salvia, J., & Ysseldyke, J. E. Assessment .in special and remedial 
education (2nd ed.). Boston! Houghton-Miff Un, 1981. 

Skager, R. The system for objectives-based evaluation--Reading.
Evaluation Comment, 1971, 2, 6-11.

Subkoviak, M. J. Estimating reliability from a single administration
of a mastery test. Madison, Wis.: Laboratory of Experimental
Design, University of Wisconsin, 1975.






Footnote 

¹In response to a written request for information concerning the
technical adequacy of the test studied here, publishers described the
field-testing that they had conducted. This response (a) alluded to,
but failed to describe, an item analysis of test data; and (b)
reported on a pre-posttest study in which students demonstrated an
average growth of 8.5 grade equivalent months in 7 chronological
months on the Gates-MacGinitie. Authors of the response stated that
"This tends to confirm that the use of [criterion-referenced]
tests...to monitor effectiveness of instruction and reteaching
contributed to an appropriate rate of progress among students."






Table 1

Traditional and Alternative Studies of Reliability and Validity Reported in Manuals of Commercial
Criterion-referenced Tests and Basal Series Mastery Tests

                              Reported in Test Manuals

Traditional (correlational) analyses                                  Alternative (criterion-referenced) analyses

reliability                                      validity
inter-   alternate-   internal      test-        construct   criterion    reliability   validity
rater    form         consistency   retest                                studies       studies



Diagnostic Inventory of 
Skills (1977) 

Diagnostic Inventory of 
Development (1977) 

1974) 

m and .Monitoring System 

ental Programming for infants 
jng children:' Assessment & 
ation (1977) 

rten Evaluation of Learning 
^al: A curricular approach to 
tlon (1963) 

Accomplishment Profile (1977) xfr 
Staircase (1976) 
es Master y 

1 11979) 

■81) 

-HiffHn (1974) 
resman (1981) 



es that a study was reported in the test manual.




Table 2

Examinees' Tasks on the Houghton-Mifflin End-of-Level 11 Basic Reading Test

Scale/Subtest               Examinees' Tasks

Decoding

  Word Attack               Read a sentence from which letter(s) of one word
                            have been deleted. From an array of three choices,
                            circle the word that most nearly sounds like the
                            unfinished word.

  Pronunciation             Given a word in dictionary spelling, select from
                            three choices the word(s) with the same vowel
                            sounds as the dictionary-spelled word.

Comprehension

  Literal Comprehension     1. Read a factual article comprising four
                               paragraphs. Then, identify each of 12
                               statements as either true or false with respect
                               to information provided in the article.

  Interpretive Thinking     1. Read a paragraph, and (a) select the main idea
                               from a set of statements, and (b) determine
                               whether each distractor is not the main idea
                               because the paragraph either fails to address
                               the statement or is broader than the statement.

  Meaning Acquisition       1. Given a sentence with an underlined word and
                               given meanings for the underlined word, select
                               the meaning that best fits the sentence.

                            2. Given a sentence with an underlined figure of
                               speech, select from a set of possible
                               statements the one best defining the figure of
                               speech in the sentence.

                            3. Given a sentence with an underlined word
                               containing a common prefix and given three
                               possible meanings, select the best meaning for
                               the underlined word.



Table 2 (continued)

Scale/Subtest               Examinees' Tasks

Reference/Study Skills

  Information Locating      1. Given a book's abbreviated index and a set of
                               questions, write page numbers of the book on
                               which a relevant answer might be located for
                               each question.

                            2. Given questions and an illustration of a
                               21-volume encyclopedia, write the volume number
                               in which relevant information might be located
                               for each question. Then, given questions and a
                               list of possible subheadings for the topic
                               Newspaper, write the subheading in which a
                               relevant answer might be located for each
                               question.

                            3. Given questions and an illustration of a card
                               catalog, identify the drawer in which a
                               relevant answer might be located for each
                               question. Then, given questions, determine
                               whether one would search for an author, title,
                               or subject card for a relevant answer to each
                               question.

                            4. Given questions and a 5-column, 10-row table
                               containing information on the first 10
                               presidents [remainder illegible]

  Information Appraising    1. Identify whether statements are fact, fiction,
                               or both.

                            2. Given a set of opinion statements and a set of
                               persons with biographical information, match
                               the person best qualified to make each opinion
                               statement.

                            3. Identify whether or not a statement contains
                               vague statements, and if so, underline the
                               vague statement.

  Information Organizing    1. Read an article. Complete a partially completed
                               outline concerning the article with three
                               [illegible] main topics, subtopics, and details.



Table 3

Student Performance on Measures of Reading Achievement

Test                                      Mean      SD

End-of-Level 11 Basic Reading Test
  Decoding Subtests
    Word Attack                           22.5     3.2
    Pronunciation                         17.9     6.1
    Decoding Composite                    40.4     8.0
  Comprehension Subtests
    Literal Comprehension                 20.2     3.8
    Interpretive Thinking                 19.7     5.5
    Meaning Acquisition                   62.3    11.5
    Comprehension Composite              102.2    17.8
  Study/Reference Skills Subtests
    Information Locating                  79.3    17.9
    Information Appraising                45.1    17.3
    Information Organizing                18.0     8.6
    Reference/Study Skill Composite      142.4    38.9

SRA Reading Achievement Test
    Vocabulary                            23.4     8.6
    Comprehension                         28.8    11.1
    Total                                 51.5    18.1

Word Reading Test
    Isolated Word Reading                 46.6    18.4
    Passage Reading                      117.8    34.5



Table 4

Test-retest Reliabilities for Houghton-Mifflin End-of-level 11
Basic Reading Test (N=20)

Subtest                                Reliability

Decoding Subtests
  Word Attack                              .42
  Pronunciation                            .20
  Decoding Composite                       .21

Comprehension Subtests
  Literal Comprehension                    .61
  Interpretive Thinking                    .03
  Meaning Acquisition                      .83
  Comprehension Composite                  .72

Study/Reference Skills Subtests
  Information Locating                     .94
  Information Appraising                   .86
  Information Organizing                   .93
  Reference/Study Skill Composite          .94
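
The test-retest coefficients in Table 4 relate scores from the first and second administrations of each subtest. The following is a minimal sketch of that computation, assuming the coefficients are Pearson product-moment correlations (the table does not name the coefficient). The function name and the scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean1 = sum(first) / n
    mean2 = sum(second) / n
    # Sums of cross-products and squared deviations about each mean.
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(first, second))
    var1 = sum((a - mean1) ** 2 for a in first)
    var2 = sum((b - mean2) ** 2 for b in second)
    if var1 == 0 or var2 == 0:
        raise ValueError("scores must vary on both administrations")
    return cov / sqrt(var1 * var2)


# Hypothetical raw scores for five students on two administrations.
time1 = [20, 22, 25, 18, 24]
time2 = [21, 20, 26, 19, 23]
retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))
```

A coefficient near 1 means students kept roughly the same rank order across administrations; it says nothing about whether the same mastery decisions were reached, which is why the report also examines difference scores and decision consistency.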



Table 5

Proportion of Subjects with Varying Percentages of Difference
Scores Across Two Administrations of the End-of-level 11
Basic Reading Test (N=20)

                                               Percentage Difference Score

                            No. of         .05   .15   .25   .35   .45   .55   .65   .75   .85
Basic Reading Test          items(a)   0    to    to    to    to    to    to    to    to    to
                                           .14   .24   .34   .44   .54   .64   .74   .84   1.0

Decoding Subtests
  Word Attack                   6     85     0    10     5     0     0     0     0     0     0
  Pronunciation                 8     45    27     0    15     7     6     0     0     0     0

Comprehension Subtests
  Literal Comprehension         ?      ?     ?    15    12     0     0     0     0     ?     ?
  Interpretive Thinking         ?      ?     ?    10     8     7     0     0     5     5     0
  Meaning Acquisition
    Words                       ?      ?     ?     5     5     0     0     0     ?     0     0
    Figures of Speech           ?     77     ?     0     0     0     0     0     0     0     0
    Affixes                     ?      ?     ?     5     5     0     5     0     6     0     0

Study/Reference Skills Subtests
  Information Locating
    Index                      12     77    15     8     0     0     0     0     0     0     0
    Encyclopedia               12     38    50     0     6     6     0     0     0     0     0
    Card Catalog               12     50    27    17     6     0     0     0     0     0     ?
    Table                       ?     55    25    20     0     0     0     0     0     0     0
  Information Appraising
    Fact/Fiction               12     22    56    22     0     0     0     0     0     0     0
    Opinion Statements          6     55     0    10    25     0     0     ?     5     0     0
    Value Expressions           6     38     0    50     0     0    12     0     0     0     0
  Information Organizing       12     75    10     5     5     5     0     0     0     0     0

(a) Number of items on the test.

Note. ? marks an entry that is illegible in the source copy.
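
The difference-score distribution in Table 5 can be illustrated with a short sketch. It assumes that each student's difference score is the absolute difference in proportion of items correct between the two administrations, and that the categories run 0, .05 to .14, and so on up to .85 to 1.0. The function names, bin boundaries, and scores are hypothetical and are not taken from the report.

```python
# Assumed category labels; differences under .05 are counted in the "0" bin.
BIN_LABELS = ["0", ".05-.14", ".15-.24", ".25-.34", ".35-.44",
              ".45-.54", ".55-.64", ".65-.74", ".75-.84", ".85-1.0"]


def bin_index(score1, score2, n_items):
    """Index of the difference-score category for one student."""
    # Difference in percentage of items correct, rounded to a whole percent.
    pct = round(100 * abs(score1 - score2) / n_items)
    if pct < 5:
        return 0
    return min((pct - 5) // 10 + 1, len(BIN_LABELS) - 1)


def difference_distribution(first, second, n_items):
    """Percentage of students falling in each difference-score category."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty score lists")
    counts = [0] * len(BIN_LABELS)
    for a, b in zip(first, second):
        counts[bin_index(a, b, n_items)] += 1
    return [100 * c / len(first) for c in counts]


# Hypothetical scores for four students on a 6-item subtest.
dist = difference_distribution([6, 5, 4, 6], [6, 5, 5, 4], n_items=6)
for label, pct in zip(BIN_LABELS, dist):
    print(label, pct)
```

With short subtests only a few difference values are possible (on a 6-item test, one item is about 17 percentage points), which is why most columns for any one subtest are empty.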






Table 6

Uncorrected and Corrected Proportion of Examinees (N=18) Placed
Into the Same Decision Categories on Two Administrations
of the End-of-level 11 Basic Reading Test

                                          Proportion of Examinees

                                                      Corrected for Chance
Basic Reading Test                   Uncorrected         Agreements(a)

Decoding Subtests
  Word Attack                            .89                 .06
  Pronunciation                          .61                 .18

Comprehension Subtests
  Literal Comprehension                  .83                 .57
  Interpretive Thinking                  .72                 .15
  Meaning Acquisition
    Words                                .72                 .31
    Figures of Speech                    .89                 .68
    Affixes                              .94                 .88

Study/Reference Skills Subtests
  Information Locating
    Index                                .89                 .68
    Encyclopedia                         .89                 .72
    Card Catalog                         .83                  ?
    Table                                .89                 .68
  Information Appraising
    Fact/Fiction                         .89                 .68
    Opinion Statements                   .89                 .78
    Value Expressions                    .78                 .51
  Information Organizing                 .89                 .78

(a) (Observed - Chance Proportions) / Maximum Value that (Observed - Chance
Proportions) Can Assume.

Note. ? marks an entry that is illegible in the source copy.
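
The chance correction described in the Table 6 footnote divides the excess of observed over chance agreement by the largest value that excess can take. The following is a minimal sketch, assuming that chance agreement is computed from the marginal mastery rates of the two administrations and that the maximum excess is 1 minus chance agreement (which makes the index Cohen's kappa). The report does not spell out either assumption, and the decisions below are hypothetical.

```python
def agreement_indices(first, second):
    """Return (observed, chance, corrected) agreement for two lists of
    mastery decisions, where True means the examinee was classified a master."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty decision lists")
    n = len(first)
    # Proportion of examinees placed in the same category both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected if the two sets of decisions were independent.
    p1 = sum(first) / n
    p2 = sum(second) / n
    chance = p1 * p2 + (1 - p1) * (1 - p2)
    # Assumed maximum of (observed - chance): perfect agreement minus chance.
    max_excess = 1 - chance
    corrected = (observed - chance) / max_excess if max_excess > 0 else float("nan")
    return observed, chance, corrected


# Hypothetical decisions for 18 examinees on two administrations.
time1 = [True] * 15 + [False] * 3
time2 = [True] * 14 + [False, True] + [False] * 2
observed, chance, corrected = agreement_indices(time1, time2)
print(round(observed, 2), round(chance, 2), round(corrected, 2))
```

When nearly everyone is classified a master on both occasions, chance agreement is already high, so a large uncorrected proportion can shrink sharply after correction; the Word Attack row of Table 6 (.89 uncorrected, .06 corrected) shows this pattern.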




Table 7

Correlations Between Basic Reading Test and SRA Test Scores (N=42)

                                                      SRA

Basic Reading Test                   Vocabulary   Comprehension   Total

Decoding Subtests
  Word Attack                           .40           .38          .40
  Pronunciation                         .42           .44          .43
  Decoding Composite                    .48           .49          .49

Comprehension Subtests
  Literal Comprehension                 .52           .61          .57
  Interpretive Thinking                 .35           .19          .26
  Meaning Acquisition                   .73           .70          .75
  Comprehension Composite               .70           .64          .69

Study/Reference Skills Subtests
  Information Locating                  .67           .63          .65
  Information Appraising                .58           .55          .53
  Information Organizing                .54           .47          .51
  Reference/Study Skill Composite       .69           .63          .65



Table 8

Correlations Between Basic Reading Test and Word Reading
Test Scores (N=46)

                                          Word Reading Test

                                     Isolated Words     Passage

Decoding Subtests
  Word Attack                             .27              ?
  Pronunciation                           .33              ?
  Decoding Composite                      .36              ?

Comprehension Subtests
  Literal Comprehension                   .41             .50
  Interpretive Thinking                   .33             .37
  Meaning Acquisition                     .55             .67
  Comprehension Composite                 .55             .66

Study/Reference Skills Subtests
  Information Locating                    .53             .64
  Information Appraising                  .48              ?
  Information Organizing                  .52             .57
  Reference/Study Skills Composite        .57             .68

Total Test Score                          .57             .65

Note. ? marks an entry that is illegible in the source copy.



Table 9

Relation Between Houghton-Mifflin Basic Reading Tests
and Criterion Classification (N=46)

                                                                  Percentage
Basic Reading Tests                  Chi-square     p value      Misclassified

Decoding Subtests
  Word Attack                            2.3          .15             43
  Pronunciation                          1.8          .22             38
  Decoding Composite                     5.1          .03             32

Comprehension Subtests
  Literal Comprehension                  1.3           ?              40
  Interpretive Thinking                   .8          .40             43
  Meaning Acquisition                    4.6          .04             34
  Comprehension Composite                5.1          .03             32

Study/Reference Skills Subtests
  Information Locating                   5.4          .02             32
  Information Appraising                  ?          <.001             ?
  Information Organizing                11.5         <.001            23
  Reference/Study Skills Composite      11.7         <.001            23

Note. ? marks an entry that is illegible in the source copy.
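
Each row of Table 9 summarizes how a subtest's mastery/nonmastery decisions line up with the criterion classification. The following is a minimal sketch of the three statistics in the table, assuming a 2 x 2 cross-classification, a chi-square test with one degree of freedom and no continuity correction, and "misclassified" meaning the test decision disagrees with the criterion group. These are assumptions rather than the report's documented procedure, and the cell counts are hypothetical.

```python
from math import erfc, sqrt


def two_by_two_summary(a, b, c, d):
    """Chi-square, p value, and percentage misclassified for a 2 x 2 table.

    a: master on test, criterion group says master
    b: master on test, criterion group says nonmaster
    c: nonmaster on test, criterion group says master
    d: nonmaster on test, criterion group says nonmaster
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one examinee")
    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    # Off-diagonal cells are the examinees the test places in the wrong group.
    pct_misclassified = 100 * (b + c) / n
    return chi2, p_value, pct_misclassified


# Hypothetical cell counts for 30 examinees.
chi2, p_value, pct = two_by_two_summary(10, 5, 5, 10)
print(round(chi2, 2), round(p_value, 3), round(pct, 1))
```

A statistically significant chi-square can coexist with a high misclassification rate, so the percentage column is the more direct guide to how often decisions based on a subtest would be wrong.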



[Figure 1 panels: "Difference in percentage of items correct on Word Attack Subtest"; "Difference in percentage of items correct on two administrations on Pronunciation Subtest". Axis values are not recoverable from the source copy.]

Figure 1. Displays of consistency of test scores on decoding subtests
of end-of-level 11 BRT.



[Figure 2 panels: "Difference in percentage of items correct on Literal Comprehension Subtest"; "... on Interpretive Thinking Subtest"; "... on Meaning Acquisition - Words Subtest"; "... on Meaning Acquisition - Figures of Speech Subtest"; "... on Meaning Acquisition - Affix Subtest". Axis values are not recoverable from the source copy.]

Figure 2. Displays of consistency of test scores on comprehension subtests
of end-of-level 11 BRT.




[Figure 3 panels: "Difference in percentage of items correct on Information Locating - Index"; "... on Information Locating - Encyclopedia"; "... on Information Locating - Card Catalog"; "... on Information Locating - [illegible]". Axis values are not recoverable from the source copy.]

Figure 3. Displays of consistency of test scores on study/reference
skills, information locating subtests of end-of-level 11 BRT.



[Figure 4 panels: "Difference in percentage of items correct on Information Appraising - Fact/Fiction"; "... on Information Appraising - Evaluation Statements"; "... on Information Appraising - Vague Expressions"; "... on Information Organizing". Axis values are not recoverable from the source copy.]

Figure 4. Displays of consistency of test scores on study/reference
skills, information appraising and information organizing
subtests of end-of-level 11 BRT.



38 



PUBLICATIONS 



Institute for Research on Learning Disabilities 
University of Minnesota 



The Institute Is hot funded for the distribution of Its publications. 
Publications may be obtained for $4 «00 each, a fee designed to cover 
printing and postage costs. Only checks and money orders payable- to 
the University of Minnesota can be accepted. ^ All orders must be pre- 
paid. Requests should be directed to: Editor, IRLD, 350 Elliott Hall; 
75 East River Road. University of Minnesota, Minneapolis > MN 55455 . 

The publications listed here are only those that have been prepared 
since 1982. For a completey annotated list of all IRLD publications, 
write to the Editor. 

Wesson, C, Mlrkln, P., & Deno, S. Teachers' use of self Instructional 
materials for learning procedures for developing and monitoring 
progress on lEP goals (Research Report No. 63), January, 1982, 

Fuchs, L., Wesson, C, Tlndal, G., Mlrkln, P., & Deno, S. Instructional 
changes, student performance, and teacher preferences; The effects 
of specific measurement and evaluation procedures (Research Report 
No. 64). January, 1982. 

Potter, M. , & Mlrkln, P. Instructional planning and Implementation 
practices of elementary and secondary resource room teachers: 
Is there a difference? (Research Report No. 65). January, 1982. 

Thurlow, M. L., & Ysseldyke, J. E. Teachers' beliefs about LP students 
(Research Report No. 66). January, 1982. [ 

Graden, J., Thurlow, M. L., & Ysseldyke, J. E. Academic engaged time 
and Its relationship to learning: A review of the literature 
(Monograph No. 17). January, 1982. 

King, R. , Wesson, C., & Deno, S. Direct and frequent measurement of 
student performance: Does It take too much time ? (Research 
Report No. 67). February, 1982. * . X 

Greener, J. W., & Thurlow, M. L. Teacher opinions about professional 
education training programs (Research Report No. 68). March, 
1982. ■ ^ ■ 

Alg0Z2:liie7'Br,"& Ysseldyke, J. Learning disabilities as a subset of 
school failure; The oversophlstlcatlon of a concept (Research 
Report No. 69). March, 1982. '~7~. 

Fuchs, D., Zem, D. S., & Fuchs, L. S. A microanalysis of par ticipant 

behavior In familiar and unfamiliar test condi tions fReBearrh 

Report No. 70) . March, 1982. ~ ' " 



39 



Shlnn, M. R. , Yaseldyke, J., Deno, S., & Tindal, G. A comparison of 
PBychomotrlc and functional dlfferoncea betwocn studonta labolod 
learning disabled and low achlcvlnR (Rescorch Report No. 71). 
March, 1982. 

Thurlow, M. L. Graden, J., Greener, J. W., & Ysseldyke, J. E. Academic 
responding time for LP and non-LD students (Research Report No. 
72). April, 1982. 

Graden, J., Thurlow, M., & Ysseldyke, J. Instructional ecology and 

academic responding time for students at three levels of teacher- 
perceived behavioral competence (Research Report No. 73). April, 
...••1982. 

Algozzlne, B., Ysseldyke, J., & Chrlstenson, 5, The Influence of 
teachers' tolerances for specific kinds of behavlora on their 
ratings of a third grade student (Research Report No. 74). 
April, 1982. ■ 

Wesson, C, Deno, S., & Mlrkln, P. Research on developing and monitor- 
ing progress on lEP goals; Current findings and Implicat ions for 
practice (Monograph No. 18). April, 1982. , 

Mlrltln, P., Marston, D. , & Deno, S. L. Direct and repeated measurement 
of academic skills; An alternative to traditional screening, re- 
ferral, and Identification of learning disabled students (Research 
Report No. 75). May, 1982. ^, 

Algozzlne, B., Ysseldyke, J. Chrlstenson, S., & Thurlow, M. • Teachers' 
Intervention choices for children exhibiting different behaviors 
In school (Research Report No. 76). June, 1982. 

^-.^ & Ysseldyke, J. E. Learning disabilities; 
The experts sp6ak out (Research Report No. 77). June, 1982. 

Thurlow, M. L., Ysseldyke, J. E. , Graden, J., Greener, J. W., &. 

Mecklenberg, C. Academic responding time for LD students receiving 
different levels of special education services (Research Report 
No. 78). June, 1982. , . 

Graden, J. L., Thurlow, M. L., Ysseldyke, J. E., & Algozzlne, B. Instruc 
tlonal ecology and academic responding time for students In differ- 
ent reading groups (Research Report No. 79), July, 1982. 

Mlrkin, P. K. , & Potter, M. L. ' A survey of program planning and Imple- 
mentation practices of LP teachers (Research Report No. 80). July, 
1982. 

Fuchs, L. S., Fuchs, P., & Warren, L. M. Special education practice 
In evaluating student progress toward goals (Research Report Np. 
81). July, 1982. , 

Kuehnle, K. , Peno, S. l". , & Mlrkln, P. K. Behavioral Measurement of 
social ad.lustment; What behaviors? What Betting A (Research 
. Report No. 82). July, 1982. . 



Fuchs, D., Daiiey, Ann Madsen, & FuchsJ L/S Examiner familiarity and 
t he relation between qualitative and quantitative Indices of ex- 
pressive languajge (Research Report Noy 83) > July; 1982, 

Vldee n, J, , D^b, S , ,, & jlarstoh. D« CorriectJ tford sequences ; A valid _ 

"" indicator of :proficlency In written expression (Regearcti Report 
^ No. 84)/ July, 198^^^ \ 

Potter , M . L ; Application of a decision theory iaod^l to eligibility 

and classification decisions In special education' (Research Report- 

■ •> ,'^;Np,. ,85) July,, 1982 • v: ;;:■../■•.••>, •,, ■ )^^\'': ^^'-'V;;'^ 

Greener, J., Thurlow, M. L., Graden, J; t;, & Yeseldyke, J;^^ 

educlfeional environment and students' respohdlng times as a function 
of students* teacher-percelved academic competence (Research Report . 
Noi 86). August, 1982. : . . ;""'^\y :■ ;V ; ; 

i)eno, S. , Marston, D. , Mlrkln, P. , Ix>wry , L. , Sindeiar^ P. , &^^^^J^ 

The use of standard tasks to measure achievement, in readinfe^ speU 
and written expression; A; normative and developmental study (Research 
Report No. 87) . August, 1982. v :'\r-:\A''-::j:'\.'--^ 

Sklba, R. , Wesson, C. , & Deho , . The effects of training 'teachers in V 
the use of formative evaluation in reading! An experlmehtalrcontrol / 
comparison (Research Report No ; 88) , Septembeir y 1982 .^^^ V ' 

Marston, D. , Tlndal, G., & Deno, S. L. Elljglblllty for learnings dlsa^ y 

blllty services; A direct and repeated measurement ^ • ^ -V 

(Research Report No. 89) . September, 1982. ' 

T^iurlow, M. L., Ysseldy^, J. E. , & Graden, J. L. LP students' active .■^■k:;.\ 
academic responding In regular and resource classrooms (Research 
Report No. 90). September,. 1982^ ^^VV^^^^^ ^^^^^^^^^^^ ^ ^^;^> r > . 

Ysseldyke, J. E., Christenson,- S . P Thurlow, M. L. , & Algozzine, 

B. An analysis of current practice in referring students for psycho--' 
educational evaluation ; Implications for change (Research Report No • 
91). October, 1982. - ; ; V , - ' ; 

Ysseldyke, J. E. , Algozzine, B. , & Epps, St A logical and empltlcj^l ' 

analysis of current practices in classifying studentis as handicapped ; 
(Research Report No. 92) . October, 1982; . v ■ 

Tindal, G., Marston, D., Deno, Si L., & Germatirii G; . Curriculum differ- 

. -ences'-in-direct— repeated-mea 

93) . ■ October, M982. ■■r:;/-^^:^-^' 

Fuchs, L.S . , Deno, ; Si L. , & Marston,^ D^ Use of aggregation to improve ; : ; ; 
the reliability of simple direct measures of acadetaic performance • 
(Research Report; No V 94). pct^^^ 

Ysseldyke , J . E; , lliurlG^ , Mi^ I- . , ^Mecklenburg ^ Ci^ * & (J^ , • J . Observed 
changes in instruction > and- stu as'v a function of ^ ' ■ S;: 

; . referral 4nd spec ial educat ion placement (Research Report No • 95) . ; ; > 



Fuchs, L. S.-i Deno, S. L., is< Mlrkln," Pi K. Effects of frequent currlcu- 
lum-baaed measurement and evaluation on student achievement and 
knowledge of performance; An experimental study (Research Report 
No. 96). November, 1982. 



Fuchs, L. S., Deno; S. T.., & Mlrkln. P. K.- Direct and frequent measure^ 
ment and evaluation; Effects oh Instruction and estim ates -of - 
student progress "(Research Report N9,. 97). November, 1982. 

Tindalj G. , Wesson, C. , Germann, G. , Deno, S. L. , & Mirkin,^^^^^^^^^^^^^ Th£ 
Pine douhty model for special education delivery; A data-based :' 
system (Monograph No . 19) , November, 19&2 . . , ' 

Epps ,. ai , Ysseldyke , J . j: . , & Algozzine, . B . An analysis of the conceptual 
fratnework underlying definitions of learning disabilities (Research 
Report No. 98) . November, 1982. 

Epps, S., Ysseldyke, J. E., &. Algozzine, B.. Public-Policy, implications 
of different definitions of learning diaabllities (Research Report 
No; 99). November, 1982. 

Ysseldyke, J. E., Thurlow, M. L., Graden, J. t.. Wesson, C, Deno^ S. L., 
& Algozzine, B. Generalizations from five yea ra of research on 
assessment and .decision making (Research Report No. 100). November, 

•1982..- -it ■ ^ ,iV ■ 

Harston D., &■ Derio, S.'iL. Measuring academic pr ogress of students with 
learning difficulties; A comparison of t he semi-logarithmic chart 
and equal interval graph paper (Research Report No . 101) , November , 
1982.-- .... . ■■ ,, ' ■ •;. 

Beattie, S., Grise,, P., & Algozzine, B> Effects of ^tegt modifications 
on minimum competency test performance of t hird gra de. learning 
disabled students (Research Report No. 102). December, 1982 - 

Algozzine, B., Ysseldyke, J. E., & Christenson, S. An analysis of the 
incidence" of special class placement: The masses are burgeoning 
(Research Report No. 103). December, 1982. 

Ma^^, D., Tindal, G., & Deno, S. L. Predictive efficiencv^of direct, 
repeated measurement; An analysis o f cost and accuracy in classi- 
ticat.ton. (Research Report No. 104) . December, . 1982. 

Wesson, C., Deno, ■.S.^ Mlrkin,.. P.," Sevcik.'BV, Ski^^^^^ King, R.,^ : 
Tindal, G., ^ M«rnvama.G. Teaching S tructure and student achieve- 
„„_ ^ment-ef f ects -of -curr iculum^based^measur ement,;__A causal ( structural) 
analysis (Research Report-; No . 105) . December, 1982 • ■ 

Mirkin, P. K^, .Fuchs, L. S., & Deno, S. L. (fids.). Cohsiderat ions for 
^. designing a continuous evaluation s vstem; An integrative review • 
(Monograph ^q. 20). December, 1982. 

Mkr«ton, D. , & Deno . S J , L . Implementat ion of di rect and repeated 

measurem en t in the school setting (Research Report No.. 106) . _ 

December, 1982. ' ' V \ '.^^ 



. . Deno, S. L. , King, R. , Skiba, R. , Sevcik, B., & Wesson, C. The structure ' 
of Instruction rating scale (SIRS); Development and technical 
characteristics (Research Report No. 107)'. January , 1983 . 



Thurlow, M. L., Ysseldyke, J. E./ & Casey. A. Criteria for identifying, v 
LP students: Definitional problems exemplified (Research Report 
No. 108) . January, 1983. ; • 

Tindal, C, Marston, D., & Deno, S.. L. The reliability of direct and : 
repeated measurement (Research Report No . 108) .February, 1983 . 

Fuchs,' D., Fuchs, L. S., Dailey, A. M, , & Power, M. H. Effects of pre- 
test contact with experienced and inexperienced examiners on handi- 
capped children's performance (Research Report No.' 110) . February, ,; 
- 1983 : . ' : 

King, R. P., Deno, S., ^^^rkin, P. , & Wesson^ C. The effect s of training 
teachers in the use of formative evaluation in reading! An experi- 
mental-control comparison (Research Report No. Ill) • February, 1983. 

Tindal, C, Deno, S. L., & Ysseldyke, J. E. Visual analysis of time 

series data; Factors of influence and level of reliability (Research 
Report No. 112). March, 1983. 

Tindal, G., Shinn, M., Fuchs, L., Fuchs, D., Deno, S., & Germann, G. The
technical adequacy of a basal reading series mastery test (Research
Report No. 113). April, 1983.



